5 research outputs found

    Privacy-Preserving Clustering of Unstructured Big Data for Cloud-Based Enterprise Search Solutions

    Cloud-based enterprise search services (e.g., Amazon Kendra) are attractive to big data owners because they provide convenient search solutions over enterprise big datasets. However, individuals and businesses that deal with confidential big data (e.g., credential documents) are reluctant to fully embrace such services, due to valid concerns about data privacy. Solutions based on client-side encryption have been explored to mitigate privacy concerns. Nonetheless, such solutions hinder data processing, specifically clustering, which is pivotal in dealing with different forms of big data. For instance, clustering is critical to limit the search space and perform real-time search operations on big datasets. To overcome the hindrance in clustering encrypted big data, we propose privacy-preserving clustering schemes for three forms of unstructured encrypted big datasets, namely static, semi-dynamic, and dynamic datasets. To preserve data privacy, the proposed clustering schemes function based on statistical characteristics of the data and determine (A) the suitable number of clusters and (B) appropriate content for each cluster. Experimental results obtained from evaluating the clustering schemes on three different datasets demonstrate an improvement of between 30% and 60% in cluster coherency compared to other clustering schemes for encrypted data. Employing the clustering schemes in a privacy-preserving enterprise search system decreases its search time by up to 78%, while increasing the search accuracy by up to 35%. Comment: arXiv admin note: text overlap with arXiv:1908.0496

    Divergence Based Non-Negative Matrix Factorization for top-N Recommendations

    Personalized top-N recommendation algorithms are among the most effective techniques for providing customized suggestions in information retrieval applications. Most current methods construct personalized recommendations based on various loss functions, such as pairwise ranking loss and point-wise recovery loss. In this paper, we propose a personalized top-N recommendation method based on non-negative matrix factorization with divergence as a point-wise ranking loss function. Our method finds the latent factors from the existing data to improve recommendation predictions. We formulate the learning problem with regularized divergence as a constrained non-convex minimization problem and develop a projected gradient descent optimization algorithm to solve it. We evaluate our approach on six datasets related to personalized recommendation tasks, using root mean squared error (RMSE) and hit rate (HR). Our experimental results demonstrate improved RMSE and HR for most of the datasets.
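
    The abstract above does not say which divergence or which update rule the authors use, so the following is only a minimal sketch under stated assumptions: it takes the generalized Kullback-Leibler divergence as the loss, adds an L2 penalty as the regularizer, and runs projected gradient descent with a simple step-halving rule. The function names, the toy rating matrix, and the choice to treat zero ratings as unseen items are all hypothetical and are not taken from the paper.

```python
import math
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def objective(v, w, h, reg):
    """Generalized KL divergence D(v || wh) plus an L2 penalty (assumed form)."""
    total = 0.0
    for v_row, a_row in zip(v, matmul(w, h)):
        for x, y in zip(v_row, a_row):
            total += y - x
            if x > 0:  # 0 * log(0) is taken as 0
                total += x * math.log(x / y)
    penalty = sum(x * x for row in w + h for x in row)
    return total + 0.5 * reg * penalty


def projected_step(v, w, h, lr, reg, eps):
    """One gradient step on w and h, then projection onto entries >= eps."""
    approx = matmul(w, h)
    # 1 - v / (wh) is the factor shared by both gradients of the KL term.
    ratio = [[1.0 - x / y for x, y in zip(v_row, a_row)]
             for v_row, a_row in zip(v, approx)]
    grad_w = matmul(ratio, transpose(h))
    grad_h = matmul(transpose(w), ratio)
    new_w = [[max(eps, wv - lr * (g + reg * wv)) for wv, g in zip(w_row, g_row)]
             for w_row, g_row in zip(w, grad_w)]
    new_h = [[max(eps, hv - lr * (g + reg * hv)) for hv, g in zip(h_row, g_row)]
             for h_row, g_row in zip(h, grad_h)]
    return new_w, new_h


def nmf_projected_gd(v, rank, steps=300, lr=0.05, reg=0.01, eps=1e-6, seed=0):
    """Factor v into non-negative w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(len(v))]
    h = [[rng.uniform(0.1, 1.0) for _ in range(len(v[0]))] for _ in range(rank)]
    history = [objective(v, w, h, reg)]
    for _ in range(steps):
        step_size = lr
        for _ in range(20):  # halve the step until the objective decreases
            cand_w, cand_h = projected_step(v, w, h, step_size, reg, eps)
            cand = objective(v, cand_w, cand_h, reg)
            if cand < history[-1]:
                w, h = cand_w, cand_h
                history.append(cand)
                break
            step_size *= 0.5
        else:
            break  # no improving step found; stop early
    return w, h, history


def top_n(v, w, h, user, n):
    """Rank the items a user has not rated (rating 0) by predicted score."""
    scores = matmul([w[user]], h)[0]
    unseen = [j for j, x in enumerate(v[user]) if x == 0]
    return sorted(unseen, key=lambda j: scores[j], reverse=True)[:n]
```

    As a usage example, calling `nmf_projected_gd(v, rank=2)` on a small user-by-item rating matrix `v` returns the two factors and the objective history, and `top_n(v, w, h, user=0, n=2)` then lists candidate items for the first user. Note that this sketch counts zero entries as observed zeros in the loss, which is a simplification; a real recommender would more likely mask unobserved ratings.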

    Adaptive and Concurrent Garbage Collection for Virtual Machines

    An important issue for concurrent garbage collection in virtual machines (VM) is to identify which garbage collector (GC) to use during the collection process. For instance, Java program execution times differ greatly based on the employed GC. It has not been possible to identify the optimal GC algorithm for a specific program without exhaustively profiling the execution times for all available GC algorithms. In this paper, we present an adaptive and concurrent garbage collection (ACGC) technique that can predict the optimal GC algorithm for a program without going through all the GC algorithms. We implement this technique in the Java virtual machine and test it using standard benchmark suites. ACGC learns the algorithms’ usage pattern from different training program features and generates a model for future programs. Feature generation and selection are two important steps of our technique, which create different attributes to use in the learning step. Our experimental evaluation shows improvement in selecting the best GC. Additionally, our approach is helpful in finding better heap size settings for improved program execution.